# Low-latency Inference

- **Arch Router 1.5B.gguf** (katanemo) | License: Other | Large Language Model, Transformers, English | 220 downloads, 1 like
  Arch-Router is a 1.5B-parameter preference-aligned routing model that maps queries to domain-operation preferences to drive model routing decisions.
- **Dmind 1** (DMindAI) | License: MIT | Large Language Model, Transformers, Multilingual | 129 downloads, 21 likes
  DMind-1 is a Web3 expert model built on Qwen3-32B, optimized for the Web3 ecosystem through supervised instruction fine-tuning and reinforcement learning from human feedback, with significant gains in task accuracy, content safety, and expert-level interaction alignment.
- **Treehop Rag** (allen-li1231) | License: MIT | Question Answering System | 36 downloads, 3 likes
  TreeHop is a lightweight embedding-level framework for efficient query-embedding generation and filtering in multi-hop QA, significantly reducing computational overhead.
- **Distil Large V3.5 Ct2** (distil-whisper) | License: MIT | Speech Recognition, English | 264 downloads, 3 likes
  Distil-Whisper is a distilled version of the Whisper model that achieves efficient speech recognition through large-scale pseudo-labeling.
- **Canary 180m Flash** (nvidia) | Speech Recognition, Multilingual | 15.17k downloads, 60 likes
  NVIDIA NeMo Canary Flash is a multilingual, multitask speech model supporting automatic speech recognition and translation in English, German, French, and Spanish.
- **Qwen2.5 VL 3B Instruct FP8 Dynamic** (RedHatAI) | License: Apache-2.0 | Text-to-Image, Transformers, English | 112 downloads, 1 like
  An FP8-quantized version of Qwen2.5-VL-3B-Instruct that accepts visual and text input, produces text output, and improves inference efficiency.
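The "dynamic" in checkpoints like this one refers to computing quantization scales from the live tensors at inference time rather than from an offline calibration pass. The sketch below illustrates that idea only; it is not the RedHatAI recipe, and it approximates the FP8 E4M3 format (maximum representable value 448) with a same-range integer grid for simplicity.

```python
import numpy as np

# Conceptual sketch of dynamic per-tensor quantization: the scale is
# derived from the tensor at runtime. Real FP8 E4M3 uses a non-uniform
# floating-point grid; an integer grid stands in here for clarity.
FP8_E4M3_MAX = 448.0

def quantize_dynamic(x):
    scale = np.abs(x).max() / FP8_E4M3_MAX        # runtime per-tensor scale
    q = np.clip(np.round(x / scale), -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale

def dequantize(q, scale):
    return q * scale
```

Because the scale tracks the actual dynamic range of each tensor, the round-trip error is bounded by half a quantization step.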
- **Mistral Small 24B Instruct 2501 AWQ** (stelterlab) | License: Apache-2.0 | Large Language Model, Transformers, Multilingual | 52.55k downloads, 18 likes
  Mistral Small 3 (version 2501) is a 24B-parameter instruction-tuned large language model that sets a new benchmark in the sub-70B class, with high knowledge density and multilingual support.
- **Yolo11n Cs2** (Vombit) | Object Detection | 22 downloads, 1 like
  A lightweight Counter-Strike 2 player-detection model based on YOLOv11, suitable for real-time object detection.
- **Mxbai Rerank Base V1** (khoj-ai) | License: Apache-2.0 | Transformers, English | 81 downloads, 1 like
  A Transformer-based reranker model used primarily for information retrieval and search-result optimization.
- **Ja Cascaded S2t Translation** (japanese-asr) | License: Apache-2.0 | Speech Recognition, Transformers | 60 downloads, 4 likes
  A cascaded Japanese speech-to-text translation pipeline that chains automatic speech recognition (ASR) with text translation to reach any target language.
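A cascaded design like the one described above is just the composition of two stages: the ASR transcript feeds the translation model. The sketch below shows the control flow only; `asr` and `translate` are hypothetical stubs standing in for the real models.

```python
# Cascaded speech-to-text translation: ASR output feeds a text
# translation stage. Both functions are illustrative stubs.
def asr(audio: dict) -> str:
    # stub: a real pipeline would run a Japanese ASR model on the waveform
    return audio["transcript"]

def translate(text: str, target: str) -> str:
    # stub: a real pipeline would run a machine-translation model
    return f"[{target}] {text}"

def cascaded_s2t(audio: dict, target: str = "en") -> str:
    return translate(asr(audio), target)
```

The trade-off of cascading is that ASR errors propagate into the translation stage, in exchange for reusing two independently strong components.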
- **Kotoba Whisper V2.1** (kotoba-tech) | License: Apache-2.0 | Speech Recognition, Transformers, Japanese | 2,589 downloads, 16 likes
  Kotoba-Whisper-v2.1 is a Japanese automatic speech recognition (ASR) model based on Whisper, with an added post-processing stack that automatically inserts punctuation.
- **Vits Ar Sa A** (wasmdashai) | Speech Synthesis, Transformers | 227 downloads, 2 likes
  A Transformers-based text-to-speech (TTS) model that converts input text into natural-sounding speech.
- **Mobileclip S1 OpenCLIP** (apple) | Image-to-Text | 7,723 downloads, 10 likes
  MobileCLIP-S1 is an efficient image-text model that achieves fast zero-shot image classification through multi-modal reinforced training.
- **Sew Ft Fake Detection** (alexandreacff) | License: Apache-2.0 | Audio Classification, Transformers, Other | 58 downloads, 0 likes
  An audio classification model fine-tuned from asapp/sew-mid-100k on the alexandreacff/kaggle-fake-detection dataset for fake-audio detection.
- **Yolov9c Cs2** (Vombit) | Object Detection | 16 downloads, 2 likes
  A Counter-Strike 2 (CS2) player-detection model based on the YOLOv9 architecture, capable of recognizing player characters in-game.
- **Vitpose Base Simple** (nielsr) | Pose Estimation, Transformers | 109 downloads, 1 like
  A Transformers-based keypoint-detection model that identifies keypoint positions in images.
- **Mixtral 8x22B Instruct V0.1** (mistralai) | License: Apache-2.0 | Large Language Model, Transformers, Multilingual | 12.80k downloads, 723 likes
  Mixtral-8x22B-Instruct-v0.1 is an instruction-fine-tuned large language model based on Mixtral-8x22B-v0.1, with multilingual and function-calling support.
- **Faster Whisper Large V3 Ja** (JhonVanced) | License: MIT | Speech Recognition, Multilingual | 46 downloads, 3 likes
  A Japanese-optimized version of OpenAI Whisper large-v3 that supports multilingual speech recognition.
- **Faster Whisper Large V2** (Systran) | License: MIT | Speech Recognition, Multilingual | 948.29k downloads, 34 likes
  Whisper large-v2 is a large-scale automatic speech recognition (ASR) model developed by OpenAI that supports multilingual speech-to-text.
- **Juice Wrld** (sail-rvc) | Speech Synthesis, Transformers | 4,527 downloads, 0 likes
  An RVC (Retrieval-based Voice Conversion) model that transforms input audio into speech in a specific style.
- **Anya** (sail-rvc) | Speech Synthesis, Transformers | 238 downloads, 0 likes
  An RVC model built for audio-to-audio voice conversion tasks.
- **Sonic48k** (sail-rvc) | Speech Synthesis, Transformers | 25 downloads, 1 like
  An RVC-based audio-to-audio model used primarily for voice conversion.
- **Sasukeuchiha** (sail-rvc) | Speech Synthesis, Transformers | 848 downloads, 0 likes
  An RVC model that converts input audio into a specific character's voice.
- **Mileycyrus2333333** (sail-rvc) | Speech Synthesis, Transformers | 30 downloads, 0 likes
  An RVC model that converts input audio into speech in a specific style.
- **Legocitynarrator** (sail-rvc) | Speech Synthesis, Transformers | 291 downloads, 2 likes
  An RVC audio-to-audio model suited to converting speech into the Lego City narrator style.
- **Jesse Pinkman** (sail-rvc) | Speech Synthesis, Transformers | 2,697 downloads, 0 likes
  An RVC model that converts input audio into the voice of Jesse Pinkman.
- **Arthurmorgan** (sail-rvc) | Speech Synthesis, Transformers | 2,916 downloads, 2 likes
  An RVC model that converts input audio into a specific voice style.
- **Faster Whisper Large V2 Japanese 5k Steps** (zh-plus) | License: MIT | Speech Recognition, Transformers, Japanese | 280 downloads, 18 likes
  A Japanese automatic speech recognition (ASR) model based on Whisper large-v2, converted with CTranslate2 for efficient inference.
- **Extractive Question Answering Not Evaluated** (autoevaluate) | License: Apache-2.0 | Question Answering System, Transformers | 18 downloads, 2 likes
  A DistilBERT model fine-tuned on the SQuAD dataset for extractive question answering, with a high exact-match rate and F1 score.
- **Levit 256** (facebook) | License: Apache-2.0 | Image Classification, Transformers | 37 downloads, 0 likes
  LeViT-256 is an efficient Transformer-based vision model designed for fast inference and pretrained on ImageNet-1k.
- **Mobilevit Small** (apple) | License: Other | Image Classification, Transformers | 894.23k downloads, 65 likes
  MobileViT is a lightweight, low-latency vision Transformer that combines the strengths of CNNs and Transformers, making it suitable for mobile devices.
- **Mobilevit Small** (Matthijs) | License: Other | Image Classification, Transformers | 39 downloads, 0 likes
  MobileViT is a lightweight, low-latency vision Transformer that combines the advantages of CNNs and Transformers, suitable for mobile devices.
- **Sbert Chinese Qmc Finance V1 Distill** (DMetaSoul) | Text Embedding, Transformers | 20 downloads, 3 likes
  A lightweight sentence-similarity model for financial-domain question matching that distills a 12-layer BERT down to 4 layers, significantly improving inference efficiency.
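Layer-reduction distillation of this kind typically trains the small student against temperature-softened teacher outputs. The snippet below is a minimal sketch of that standard soft-label loss; the exact DMetaSoul training recipe is not documented here, so treat this as the generic technique rather than their implementation.

```python
import numpy as np

def softmax(logits, T=1.0):
    z = np.asarray(logits, dtype=float) / T
    e = np.exp(z - z.max())               # shift for numerical stability
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, T=2.0):
    # KL divergence between temperature-softened teacher and student
    # distributions; the T^2 factor keeps gradient scale comparable
    # across temperatures (Hinton et al.'s convention).
    p = softmax(teacher_logits, T)        # soft targets
    q = softmax(student_logits, T)        # student predictions
    return float(np.sum(p * (np.log(p) - np.log(q)))) * T * T
```

The loss is zero when the student matches the teacher exactly and grows as their softened distributions diverge.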
- **Ms Marco TinyBERT L6** (cross-encoder) | License: Apache-2.0 | Text Embedding, English | 6,963 downloads, 1 like
  A cross-encoder trained on the MS MARCO passage-ranking task, suited to query-passage relevance scoring in information retrieval.
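A cross-encoder reranker scores each (query, passage) pair jointly and sorts candidates by that score. The flow can be sketched as below; `toy_score` is a hypothetical word-overlap stand-in for the learned model, which would otherwise encode both texts together and emit a relevance logit.

```python
# Reranking flow of a cross-encoder: score every candidate pair, then
# sort candidates by descending relevance.
def toy_score(query: str, passage: str) -> float:
    # hypothetical stand-in for the model's learned relevance score
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / max(len(q), 1)

def rerank(query: str, passages: list[str]) -> list[str]:
    return sorted(passages, key=lambda p: toy_score(query, p), reverse=True)
```

In a first-stage-retrieval plus rerank setup, a cheap retriever supplies the candidate list and the cross-encoder reorders only that short list, which is what keeps the approach affordable.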
- **BERT NLP** (subbareddyiiit) | Large Language Model | 18 downloads, 0 likes
  A general-purpose language model for various natural language processing tasks (description inferred; no further details provided).
- **Distilbart Xsum 12 3** (sshleifer) | License: Apache-2.0 | Text Generation, English | 579 downloads, 11 likes
  DistilBART is a distilled version of BART optimized for summarization, significantly reducing parameter count and inference time while maintaining strong performance.
- **Klue Bert Base Mrc** (ainize) | Question Answering System, Transformers, Korean | 120 downloads, 5 likes
  A Korean extractive question-answering model based on KLUE BERT-base, built for Korean machine reading comprehension.